4 research outputs found

    Management of generic and multi-platform workflows for exploiting heterogeneous environments on e-Science

    Scientific Workflows (SWFs) are widely used to model applications in e-Science. In this programming model, scientific applications are described as a set of tasks with dependencies among them. Over the last decades, scientific workflows have been executed successfully on the available computing infrastructures (supercomputers, clusters and grids) using software programs called Workflow Management Systems (WMSs), which orchestrate the workload on top of these infrastructures. However, because each computing infrastructure has its own architecture and each scientific application efficiently exploits one particular infrastructure, the way in which applications are executed must be organized. WMSs need to get the most out of all the available computing and storage resources. Traditionally, scientific workflow applications have been extensively deployed on high-performance computing infrastructures (such as supercomputers and clusters) and grids. In recent years, however, the advent of cloud computing has opened the door to using on-demand infrastructures to complement or even replace local ones. This raises new issues, such as the integration of hybrid resources or the trade-off between infrastructure reuse and elasticity, all while remaining cost-efficient. The main contribution of this thesis is an ad hoc solution for managing workflows that exploits the capabilities of cloud computing orchestrators to deploy resources on demand according to the workload, and to combine heterogeneous cloud providers (such as on-premise and public clouds) with traditional infrastructures (supercomputers and clusters) to minimize cost and response time. The thesis does not propose yet another WMS, but demonstrates the benefits of integrating cloud orchestration when running complex workflows.
The thesis presents experiments with several configurations and multiple heterogeneous backends using a realistic comparative genomics workflow called Orthosearch, migrating memory-intensive workloads to public infrastructures while keeping other blocks of the experiment running locally. The running time and cost of the experiments are computed, and best practices are suggested.
Carrión Collado, AA. (2017). Management of generic and multi-platform workflows for exploiting heterogeneous environments on e-Science [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86179
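
The abstract above models a scientific application as a set of tasks with dependencies among them, i.e. a directed acyclic graph. As a minimal sketch (not taken from the thesis and not the API of any particular WMS), tasks can be grouped into stages with Kahn's topological sort: tasks in the same stage have no mutual dependencies, so a WMS could dispatch them in parallel. The task names are hypothetical.

```python
def topological_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages so each task depends only on tasks in earlier stages.

    `deps` maps every task to the set of tasks it depends on; every task,
    including those with no dependencies, must appear as a key.
    This is Kahn's algorithm processed level by level.
    """
    # Count unmet dependencies per task and record reverse edges (parent -> children).
    remaining = {task: len(parents) for task, parents in deps.items()}
    dependents: dict[str, list[str]] = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            dependents[parent].append(task)

    stages: list[list[str]] = []
    ready = sorted(task for task, n in remaining.items() if n == 0)
    scheduled = 0
    while ready:
        stages.append(ready)
        scheduled += len(ready)
        next_ready = []
        for task in ready:
            for child in dependents[task]:
                remaining[child] -= 1
                if remaining[child] == 0:  # all parents finished
                    next_ready.append(child)
        ready = sorted(next_ready)

    if scheduled != len(deps):
        raise ValueError("dependency cycle detected; a workflow must be a DAG")
    return stages


# Hypothetical four-task workflow: one input split, two independent searches, one merge.
workflow = {
    "split_input": set(),
    "search_a": {"split_input"},
    "search_b": {"split_input"},
    "merge": {"search_a", "search_b"},
}
print(topological_stages(workflow))
# → [['split_input'], ['search_a', 'search_b'], ['merge']]
```

Sorting each stage only makes the output deterministic; a real scheduler would also weigh task cost and resource availability when dispatching a stage.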

    Analysis and characterization of BLAST jobs for efficient scheduling in Grid and Supercomputing environments

    Running experiments with BLAST, one of the most widely used bioinformatics tools, requires computational resources that exceed those available at most research centers. This thesis focuses on developing techniques for the efficient execution of these experiments on both Grid and Supercomputing infrastructures.
Carrión Collado, AA. (2010). Análisis y caracterización de trabajos BLAST para la planificación eficiente en entorno Grid y Supercomputación. http://hdl.handle.net/10251/1385

    Managing Workflows on top of a Cloud Computing Orchestrator for using heterogeneous environments on e-Science

    Scientific workflows (SWFs) are widely used to model processes in e-Science. SWFs are executed by means of workflow management systems (WMSs), which orchestrate the workload on top of computing infrastructures. The advent of cloud computing infrastructures has opened the door to using on-demand infrastructures to complement or even replace local infrastructures. However, new issues have arisen, such as the integration of hybrid resources or the trade-off between infrastructure reutilisation and elasticity. In this article, we present an ad hoc solution for managing workflows that exploits the capabilities of cloud orchestrators to deploy resources on demand according to the workload, and to combine heterogeneous cloud providers (such as on-premise clouds and public clouds) with traditional infrastructures (clusters) to minimise costs and response time. The work does not propose yet another WMS but demonstrates the benefits of integrating cloud orchestration when running complex workflows. The article presents several configuration experiments using a realistic comparative genomics workflow called Orthosearch, migrating memory-intensive workload to public infrastructures while keeping other blocks of the experiment running locally. The article computes running time and cost and suggests best practices.
This paper acknowledges the support of the EUBrazilCC project, funded by the European Commission (STREP 614048) and the Brazilian MCT/CNPq N. 13/2012, for the use of its infrastructure. The authors would also like to thank the Spanish 'Ministerio de Economia y Competitividad' for the project 'Clusters Virtuales Elasticos y Migrables sobre Infraestructuras Cloud Hibridas' (Elastic and Migratable Virtual Clusters on Hybrid Cloud Infrastructures) with reference TIN2013-44390-R.
Carrión Collado, AA.; Caballer Fernández, M.; Blanquer Espert, I.; Kotowski, N.; Jardim, R.; Dávila, AMR. (2017). Managing Workflows on top of a Cloud Computing Orchestrator for using heterogeneous environments on e-Science. International Journal of Web and Grid Services. 13(4):375-402. doi:10.1504/IJWGS.2017.10003225
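
The strategy described above, moving memory-intensive blocks to a public provider while keeping the rest local, can be illustrated with a simple rule-based placement plus a pay-per-use cost estimate. This is a hypothetical sketch, not the article's orchestrator: the `Task`/`Site` types, site names, memory sizes and prices are assumptions, and it assumes one node per task billed per hour of run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    mem_gb: float   # peak memory the task needs (hypothetical estimate)
    hours: float    # estimated run time (hypothetical estimate)


@dataclass(frozen=True)
class Site:
    name: str
    mem_gb_per_node: float
    price_per_node_hour: float  # 0.0 models an already-paid-for local cluster


def place(tasks: list[Task], local: Site, public: Site) -> dict[str, Site]:
    """Keep a task local if it fits in a local node; otherwise burst it to the public cloud."""
    placement: dict[str, Site] = {}
    for task in tasks:
        if task.mem_gb <= local.mem_gb_per_node:
            placement[task.name] = local
        elif task.mem_gb <= public.mem_gb_per_node:
            placement[task.name] = public
        else:
            raise ValueError(f"{task.name} needs {task.mem_gb} GB; no site can host it")
    return placement


def cost(tasks: list[Task], placement: dict[str, Site]) -> float:
    """Pay-per-use cost, assuming one node per task billed for its whole run time."""
    return sum(placement[t.name].price_per_node_hour * t.hours for t in tasks)


# Hypothetical sites and tasks, for illustration only.
local = Site("local-cluster", mem_gb_per_node=16.0, price_per_node_hour=0.0)
public = Site("public-cloud", mem_gb_per_node=64.0, price_per_node_hour=0.5)
tasks = [Task("align", mem_gb=8.0, hours=2.0), Task("cluster", mem_gb=48.0, hours=3.0)]
plan = place(tasks, local, public)
print({name: site.name for name, site in plan.items()})
# → {'align': 'local-cluster', 'cluster': 'public-cloud'}
print(cost(tasks, plan))  # → 1.5
```

A memory threshold is the simplest possible policy; an orchestrator-driven approach would additionally account for data transfer, node start-up time and reuse of already-deployed nodes, which this sketch ignores.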

    Supercomputing and grid computing on the verification of covering arrays

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-012-0763-0
Covering Arrays (CAs) are mathematical objects with minimal coverage and maximum cardinality that are a good tool for the design of experiments. A covering array is an $N \times k$ matrix over an alphabet of size $v$ such that each $N \times t$ subarray contains every combination from $\{0, 1, \ldots, v-1\}^t$ at least once, for a given positive integer $t$. The process of ensuring that a CA contains each of the $v^t$ combinations is called CA verification. In this paper, we present an algorithm for CA verification and its implementation details in three different computation paradigms: (a) a sequential approach (SA); (b) a parallel approach (PA); and (c) a Grid approach (GA). Four different PAs were compared on their performance in verifying whether a matrix is a CA; the best-performing PA was then included in a separate experiment in which the three paradigms (SA, PA and GA) were compared on a benchmark of 45 possible CA instances. The results show the limitations of the different paradigms when solving the CA verification problem and point out the need for a Grid approach as the size of a CA grows. © 2012 Springer Science+Business Media, LLC.
The authors thankfully acknowledge the computer resources and assistance provided by the Spanish Supercomputing Network (TIRANT-UV). This research work was partially funded by the following projects: CONACyT 58554, Calculo de Covering Arrays (Computation of Covering Arrays); 51623 Fondo Mixto CONACyT y Gobierno del Estado de Tamaulipas (Mixed Fund of CONACyT and the Government of the State of Tamaulipas).
Avila George, H.; Torres Jimenez, J.; Rangel Valdez, N.; Carrión Collado, AA.; Hernández García, V. (2012). Supercomputing and grid computing on the verification of covering arrays. Journal of Supercomputing. 62(2):916-945. doi:10.1007/s11227-012-0763-0
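
CA verification follows directly from the definition: for every choice of $t$ columns, the rows restricted to those columns must contain all $v^t$ tuples. The following is a minimal sequential sketch, not the authors' algorithm. The helper `column_chunks` is a hypothetical illustration of how the $\binom{k}{t}$ column combinations could be split into independent work units, which is the kind of decomposition a parallel or Grid approach would distribute; here the chunks are simply checked one after another.

```python
from itertools import combinations, islice, product


def missing_tuples(matrix, v, t, cols):
    """Return the t-tuples over {0, ..., v-1} that never appear in the given columns."""
    seen = {tuple(row[c] for c in cols) for row in matrix}
    return [combo for combo in product(range(v), repeat=t) if combo not in seen]


def verify_chunk(matrix, v, t, column_sets):
    """True if every column set in this chunk covers all v**t tuples."""
    return all(not missing_tuples(matrix, v, t, cols) for cols in column_sets)


def column_chunks(k, t, size):
    """Split the C(k, t) column sets into batches of at most `size` (e.g. one batch per job)."""
    it = combinations(range(k), t)
    while batch := list(islice(it, size)):
        yield batch


def is_covering_array(matrix, v, t, chunk_size=100):
    """Sequential driver: verify every chunk in turn, stopping at the first failure."""
    k = len(matrix[0])
    return all(verify_chunk(matrix, v, t, batch) for batch in column_chunks(k, t, chunk_size))


# A binary strength-2 covering array with N = 4 rows and k = 3 columns.
ca = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
print(is_covering_array(ca, v=2, t=2, chunk_size=1))    # → True
print(missing_tuples(ca[:3], v=2, t=2, cols=(0, 1)))    # → [(1, 1)]
```

Because each chunk is verified independently and the matrix is read-only, the chunks could in principle be handed to separate processes or Grid jobs, with the final answer being the logical AND of their results.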